
New Models for the Correlation in Sensor Data 



Samar Agnihotri 

Centre for Electronics Design and Technology 
Indian Institute of Science, Bangalore-560012, India. 
Email: samar@cedt.iisc.ernet.in 



Abstract 

In this paper, we propose two new models of spatial correlation in sensor data for data-gathering sensor networks. A particular
property of these models is that if a sensor node knows how many bits it needs to transmit, then it also knows which
bits of its data it needs to transmit.

I. Introduction 

Explicit models of spatial correlation in sensor data have been proposed in [1] and [2]. However, these models are
not well suited to spatial correlation in real data-gathering sensor networks: the model proposed in [1] is impractical,
and the model proposed in [2] is computationally very intensive. Moreover, the latter model gives only the average number
of bits transmitted by a node under a sensor polling schedule, more precisely, the differential entropy of a sensor node
conditioned on the data of the nodes which have already transmitted their respective data. It is therefore not suitable
for computing the worst-case number of bits transmitted by a node. Given these limitations and the lack of any other
non-trivial, practical model of spatial correlation in sensor data, in this work we propose two such models which are
computationally simple, yet capture well our intuition of spatial correlation in sensor data. These models can also be
used to compute both the average and the worst-case number of bits transmitted by a node under a transmission schedule.

II. New Models of Spatial Correlation 

Let $X_i$ be the random variable representing the sampled sensor reading at node $i \in \{1, \ldots, N\}$, and let $B(X_i)$ denote the
number of bits that node $i$ has to transmit. Let us assume that each node $i$ has at most $n$ bits to transmit, so
$B(X_i) = n$. However, due to the spatial correlation among sensor readings, each sensor may send fewer than $n$ bits.
Let $d_{ij}$ denote the distance between nodes $i$ and $j$.

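Model 1: Let us define $B(X_i/X_j)$, the number of bits that node $i$ has to transmit when node $j$ has already
communicated its data, as follows:

$$B(X_i/X_j) = \begin{cases} \lceil \alpha_1 d_{ij}^{\beta_1} \rceil & \text{if } \alpha_1 d_{ij}^{\beta_1} < n, \\ n & \text{otherwise,} \end{cases} \tag{1}$$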

where the parameters $\alpha_1, \beta_1 \in \mathbb{R}$, $\alpha_1 > 0$, account for the various application-specific correlation effects. Figure 1 illustrates
this for $n = 5$, $\alpha_1 = 1.0$, $\beta_1 = 1.0$. Note that once node $j$ has transmitted its data, node $i$ transmits
no more than $B(X_i/X_j)$ bits of its $n$-bit data, and we define these to be the least significant
$B(X_i/X_j)$ bits of its $n$-bit data. So, if node $i$ knows how many bits it needs to transmit, then it also knows
which bits of its data it needs to transmit. We do not address here how a node learns how many
bits it should transmit; that question is beyond the scope of the present work and is discussed elsewhere [3].
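The pairwise rule and the least-significant-bits convention can be sketched in a few lines of Python. This is only an illustration: it assumes that Model 1 has the capped power-law form $\lceil \alpha_1 d_{ij}^{\beta_1} \rceil$ for $\alpha_1 d_{ij}^{\beta_1} < n$ and $n$ otherwise, and the names `bits_model1` and `bits_to_send` are hypothetical.

```python
import math


def bits_model1(d_ij: float, n: int, alpha1: float = 1.0, beta1: float = 1.0) -> int:
    """Bits node i sends once node j has transmitted (assumed form of Model 1)."""
    raw = alpha1 * d_ij ** beta1
    # A node never sends more than its full n-bit reading.
    return math.ceil(raw) if raw < n else n


def bits_to_send(reading: int, b: int) -> int:
    """Keep only the b least significant bits of an integer reading."""
    return reading & ((1 << b) - 1)


n = 5
b = bits_model1(2.3, n)            # 3 bits for a neighbor at distance 2.3
payload = bits_to_send(0b10110, b)  # 0b110, the 3 least significant bits
```

Because the bit count alone fixes the mask, the node needs no further information to decide which bits to send.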
From the definition in (1), the symmetry of the conditional number of bits follows:

$$B(X_i/X_j) = B(X_j/X_i) \tag{2}$$

However, the definition of the correlation model is not complete yet and we must give the expression for the number of 
bits transmitted by a node conditioned on more than one node already having transmitted their bits. There are several ways in 
which this quantity can be defined. Here we have chosen to define it in the following two ways: 

$$B(X_i/X_1, \ldots, X_{i-1}) = \min_{1 \le j < i} B(X_i/X_j) \tag{3}$$

$$B(X_i/X_1, \ldots, X_{i-1}) = \max_{1 \le j < i} B(X_i/X_j) \tag{4}$$

So, according to (3), the number of bits transmitted by node $i$ depends only on its nearest neighbor among all the
nodes which have already communicated their data, and according to (4), it depends only on the farthest neighbor among all
the nodes which have already communicated their data.
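The two multi-node definitions can be sketched as a minimum and a maximum over the pairwise bit counts. The pairwise rule below assumes the capped power-law form $\lceil \alpha_1 d^{\beta_1} \rceil$ (capped at $n$), and it assumes that a node with no predecessors sends all $n$ bits; the function names are hypothetical.

```python
import math


def pair_bits(d: float, n: int, alpha: float = 1.0, beta: float = 1.0) -> int:
    """Assumed pairwise rule: ceil(alpha * d**beta), capped at n."""
    raw = alpha * d ** beta
    return math.ceil(raw) if raw < n else n


def bits_nearest(distances, n: int) -> int:
    """Definition (3): minimum over the nodes that have already transmitted."""
    return min((pair_bits(d, n) for d in distances), default=n)


def bits_farthest(distances, n: int) -> int:
    """Definition (4): maximum over the nodes that have already transmitted."""
    return max((pair_bits(d, n) for d in distances), default=n)


# Distances from node i to three nodes that have already transmitted.
heard = [0.5, 2.2, 9.0]
low = bits_nearest(heard, 5)    # 1, set by the neighbor at distance 0.5
high = bits_farthest(heard, 5)  # 5, set by the neighbor at distance 9.0
```

With $\alpha_1 > 0$ and $\beta_1 > 0$ the pairwise count is non-decreasing in distance, which is why the minimum is attained at the nearest neighbor and the maximum at the farthest.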

Let $S$ be the set of nodes which have already transmitted their data. The rationale behind the definition in (3) is that the
sampled reading of a node is most correlated with the reading of its nearest neighbor in the set $S$. So, if two nodes are spatially
close, then their data is most likely to differ only in the least significant bits. Similarly, the intuition behind the definition in (4)
is that the sampled reading of a node is least correlated with the reading of its farthest neighbor in the set $S$. So, the number
of bits that a node has to transmit conditioned only on its farthest neighbor in the set $S$ gives an upper bound on the number of
bits that the particular node has to transmit for the given set $S$.

Fig. 1. First Data Correlation Model for $n = 5$: plot of $B(X_i/X_j)$ defined in eqn (1) versus $d_{ij}$, the distance between nodes $i$ and $j$.


Note that when the nodes transmit their data according to some polling schedule $\pi$, then (1) denotes the number of bits
transmitted by node $\pi(i)$ when node $\pi(j)$ has already transmitted its data. $B(X_i/X_1, \ldots, X_{i-1})$ in (3) and (4) should
be interpreted similarly. Also note that for the correlation models in (3) and (4), the sum of the number of bits transmitted by
all the nodes depends on the transmission schedule according to which the nodes transmit their data.
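A small example makes the schedule dependence concrete. The sketch below uses the nearest-neighbor definition (3) with an assumed pairwise rule $\lceil \alpha_1 d^{\beta_1} \rceil$ capped at $n$; the node positions on a line, the helper names, and the convention that the first node in a schedule sends all $n$ bits are assumptions made for illustration.

```python
import math


def pair_bits(d: float, n: int, alpha: float = 1.0, beta: float = 1.0) -> int:
    """Assumed pairwise rule: ceil(alpha * d**beta), capped at n."""
    raw = alpha * d ** beta
    return math.ceil(raw) if raw < n else n


def total_bits(schedule, positions, n: int) -> int:
    """Sum of bits sent by all nodes when polled in the given order, using (3)."""
    total = 0
    for k, node in enumerate(schedule):
        heard = schedule[:k]  # nodes that transmitted before this one
        total += min(
            (pair_bits(abs(positions[node] - positions[j]), n) for j in heard),
            default=n,  # the first node has no side information
        )
    return total


positions = [0.0, 1.0, 4.0]  # hypothetical node coordinates on a line
print(total_bits((0, 1, 2), positions, 5))  # 9  = 5 + 1 + 3
print(total_bits((0, 2, 1), positions, 5))  # 10 = 5 + 4 + 1
```

The same three nodes cost 9 bits under one polling order and 10 under another, so the choice of schedule matters even in this tiny network.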

Model 2: Let us generalize the previous model of spatial correlation in sensor data and define $B(X_i/X_j)$, the number of
bits that node $i$ has to transmit when node $j$ has already communicated its data, as follows:

$$B(X_i/X_j) = \left\lceil n \left(1 - \alpha_2 e^{-\beta_2 d_{ij}^{2}}\right) \right\rceil, \tag{5}$$



where the parameters $\alpha_2, \beta_2 \in \mathbb{R}$, $\alpha_2 > 0$, account for the various application-specific correlation effects. Figure 2 illustrates
this for $n = 5$, $\alpha_2 = 1.0$, $\beta_2 = 1.0$. Once more, note that once node $j$ has transmitted its data,
node $i$ transmits no more than $B(X_i/X_j)$ bits of its $n$-bit data, and we define these to be the least
significant $B(X_i/X_j)$ bits of its $n$-bit data. So, if node $i$ knows how many bits it needs to transmit, then it also
knows which bits of its data it needs to transmit.
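The second pairwise rule can be sketched in the same way. This assumes that the exponent in (5) has the Gaussian form $-\beta_2 d_{ij}^2$; the function name `bits_model2` and the clamping of the result to the range $[0, n]$ are assumptions added for illustration, not part of the model's definition.

```python
import math


def bits_model2(d_ij: float, n: int, alpha2: float = 1.0, beta2: float = 1.0) -> int:
    """Bits node i sends once node j has transmitted (assumed form of Model 2)."""
    frac = 1.0 - alpha2 * math.exp(-beta2 * d_ij ** 2)
    # Clamp to [0, n] so that alpha2 > 1 cannot yield a negative count (assumption).
    return min(n, max(0, math.ceil(n * frac)))


n = 5
for d in (0.0, 0.5, 1.0, 3.0):
    print(d, bits_model2(d, n))  # bit count rises from 0 towards n as d grows
```

For these parameters the count saturates at $n$ within a few distance units, whereas the power-law rule of Model 1 with $\alpha_1 = \beta_1 = 1$ reaches $n$ only at $d_{ij} = n$.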



Fig. 2. Second Data Correlation Model for $n = 5$: plot of $B(X_i/X_j)$ defined in eqn (5) versus $d_{ij}$, the distance between nodes $i$ and $j$.



For small values (in magnitude) of the exponent on the right-hand side and $\alpha_2 = 1.0$, the correlation model in (5)
reduces to the correlation model in (1), if we identify $\alpha_1 = \beta_2$ and $\beta_1 = 2$. So, the model in (1) is the linear approximation
of the model in (5).
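
The small-exponent step can be written out as a first-order expansion. This is only a sketch, and it assumes that the exponent in (5) has the Gaussian form $-\beta_2 d_{ij}^2$, with $\alpha_2 = 1$:

```latex
e^{-x} \approx 1 - x \quad (|x| \ll 1)
\;\Longrightarrow\;
1 - e^{-\beta_2 d_{ij}^{2}} \approx \beta_2 d_{ij}^{2}.
```

To first order, the bit count therefore grows as a power of $d_{ij}$ with exponent 2 and a coefficient set by $\beta_2$, which is the power-law shape of the first model.
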
From the definition in (5), the symmetry of the conditional number of bits follows:

$$B(X_i/X_j) = B(X_j/X_i) \tag{6}$$

Now, let us define the number of bits transmitted by a node conditioned on more than one node already having transmitted 
their bits as: 



(7) 



$B(X_i/X_1, \ldots, X_{i-1})$ denotes the maximum number of bits that node $i$ transmits, given that the nodes from the set $S =
\{1, \ldots, i-1\}$ have already transmitted their data.

The intuition behind the above model of correlation is that the number of bits that a node has to transmit, with the nodes
in the set $S$ having already transmitted their data, should be the (weighted) average of the numbers of bits that the particular
node has to transmit conditioned on each node in the set $S$ individually. The choice of the exponential dependence on the
internode distance is based on the Gaussian correlation model proposed in [2].

Note once more that when the nodes transmit their data according to some polling schedule $\pi$, then (5) denotes the number
of bits transmitted by node $\pi(i)$ when node $\pi(j)$ has already transmitted its data. $B(X_i/X_1, \ldots, X_{i-1})$ in (7) should
be interpreted similarly. Also note that for the correlation model in (7), the sum of the number of bits transmitted by all the
nodes depends on the transmission schedule according to which the nodes transmit their data.